The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
Basri, Ronen, Jacobs, David, Kasten, Yoni, Kritchman, Shira
We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions. We derive the corresponding eigenvalues for each frequency after introducing a bias term in the model. This bias term had been omitted from the linear network model without significantly affecting previous theoretical results. However, we show theoretically and experimentally that a shallow neural network without bias cannot represent or learn simple, low frequency functions with odd frequencies. Our results lead to specific predictions of the time it will take a network to learn functions of varying frequency. These predictions match the empirical behavior of both shallow and deep networks.
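As a rough illustration of the frequency dependence described above, the following sketch (not the authors' code; the network width, learning rate, error threshold, and the sinusoidal targets sin(k*theta) are illustrative assumptions) trains a shallow ReLU network with bias on points distributed on the unit circle and records how many gradient steps each frequency needs to reach a fixed error:

import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training inputs uniformly distributed on the unit circle (normalized data on a hypersphere).
n = 256
theta = torch.rand(n) * 2 * math.pi
X = torch.stack([torch.cos(theta), torch.sin(theta)], dim=1)

def steps_to_fit(k, width=2048, lr=0.1, tol=0.05, max_steps=50000):
    # Shallow ReLU network WITH bias terms, trained by full-batch gradient descent
    # on the target sin(k * theta); returns the number of steps to reach MSE < tol.
    y = torch.sin(k * theta).unsqueeze(1)
    net = nn.Sequential(nn.Linear(2, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for step in range(max_steps):
        loss = ((net(X) - y) ** 2).mean()
        if loss.item() < tol:
            return step
        opt.zero_grad()
        loss.backward()
        opt.step()
    return max_steps

for k in (1, 2, 4, 8):
    print(f"frequency k={k}: reached MSE < 0.05 after {steps_to_fit(k)} steps")

Under the linearized picture summarized in the abstract, the recorded step counts should grow with the frequency k, i.e., higher-frequency targets converge more slowly.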
Reviews: The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
Which functions NNs learn (i.e., which function they approximate) and how fast they learn them are central questions in the study of the dynamics of (deep) NNs. A common conception behind this problem is that training a network longer than necessary may lead to overfitting. However, the definition of overfitting appears to vary from paper to paper. Moreover, overfitting is intimately linked with another hot topic in the area: over-parametrization. Please refer to "Advani & Saxe 2017, High-Dimensional Dynamics of Generalization Error for NNs" for a modern take on this link. Keeping this link in mind, we focus on fixed-size networks.
How (Implicit) Regularization of ReLU Neural Networks Characterizes the Learned Function -- Part II: the Multi-D Case of Two Layers with Random First Layer
Heiss, Jakob, Teichmann, Josef, Wutte, Hanna
Randomized neural networks (randomized NNs), in which only the terminal layer's weights are optimized, constitute a powerful model class for reducing the computational cost of training neural networks. At the same time, these models generalize surprisingly well in various regression and classification tasks. In this paper, we give an exact macroscopic characterization (i.e., a characterization in function space) of the generalization behavior of randomized, shallow NNs with ReLU activation (RSNs). We show that RSNs correspond to a Generalized Additive Model (GAM)-type regression in which infinitely many directions are considered: the Infinite Generalized Additive Model (IGAM). The IGAM is formalized as the solution to an optimization problem in function space for a specific regularization functional and a fairly general loss. This work extends to multivariate NNs the results of [9], where we showed how wide RSNs with ReLU activation behave like spline regression under certain conditions when the input dimension is d = 1.
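A minimal sketch of this model class, under assumed toy data and hyperparameters (this is not the authors' implementation): the first layer of a shallow ReLU network is drawn at random and frozen, and only the terminal layer's weights are fit, here in closed form by ridge regression on the random features.

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data in d = 2 dimensions (an illustrative target, not from the paper).
n, d, width = 200, 2, 2000
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.05 * rng.standard_normal(n)

# Random, frozen first layer: ReLU features that are never trained.
W = rng.standard_normal((d, width))
b = rng.standard_normal(width)
features = np.maximum(X @ W + b, 0.0)

# Only the terminal layer is optimized, here in closed form by ridge regression.
lam = 1e-3
w_out = np.linalg.solve(features.T @ features + lam * np.eye(width), features.T @ y)

# Evaluate the fitted RSN on fresh inputs.
X_test = rng.uniform(-1.0, 1.0, size=(1000, d))
y_test = np.sin(3.0 * X_test[:, 0]) + 0.5 * X_test[:, 1] ** 2
pred = np.maximum(X_test @ W + b, 0.0) @ w_out
print("test RMSE:", np.sqrt(np.mean((pred - y_test) ** 2)))

Because only the last layer is trained, fitting reduces to a linear least-squares problem in the random ReLU features, which is what makes RSNs cheap to train relative to fully trained networks.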